Predicting Web Information Content

Authors

  • Tingshao Zhu
  • Russell Greiner
  • Gerald Häubl
  • Robert Price
Abstract

This paper introduces a novel method for predicting the current information need of a web user from the content of the pages the user has visited and the actions the user has applied to these pages. This inference is based on a parameterized model of how the sequence of actions chosen by the user indicates the degree to which page content satisfies the user’s information need. We show that the model parameters can be estimated using standard methods from a labelled corpus. Data from lab experiments demonstrate that the prediction model can effectively identify the information needs of new users, browsing previously unseen pages. The paper concludes with an overview of our “complete-web” recommendation system, WebIC, which uses the prediction model to recommend useful pages to the user, from anywhere on the Web.

Similar resources

Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...

Effective Learning to Rank Persian Web Content

Persian is one of the most widely used languages on the Web. Hence, the Persian Web includes invaluable information that must be retrieved effectively. As with other languages, ranking algorithms for Persian Web content face several challenges, such as applicability issues in real-world situations and the lack of user modeling. CF-Rank, as a ...

Analyzing new features of infected web content in detection of malicious web pages

Recent improvements in web standards and technologies enable attackers to hide and obfuscate malicious code with new methods and thus escape security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...

Web-based Information for Medical Tourism: Case Study of AriaMedTour Medical Tourism Company, Iran

Objective: As one of the well-known countries for medical tourism, Iran has the potential for growth in this industry and requires information and advertisements in online media and websites. This study aims to investigate the effectiveness of the content produced by the website of AriaMedTour Medical Tourism Company in informing tourists. Methods: This is an applied study that adopted an indu...

Publication date: 2003